Grammatical Induction and Recognition of the Documentary Form of Records

Authors

  • William Underwood
  • Sheila Isbell
  • Matthew Underwood
Abstract

This paper gives digital curators a more precise understanding of the concept of documentary form and shows how documentary form can be learned automatically from a sample of records of a particular document type. The ability to recognize documentary form automatically enables item description; item description enables file unit description, which in turn enables automatic series description. This technology can reduce the effort required of an appraisal archivist to assess the value of record series containing large numbers of e-records of different documentary forms. It can also give archivists earlier intellectual control of accessioned e-record series by providing preliminary scope and content notes for these series. Item descriptions also provide additional ways of indexing and searching collections of records.

Introduction

Among the challenges archivists face in appraising e-records and gaining intellectual control of accessioned e-records are the enormous volume of records and the time required to read and understand their content. According to one source, "the Clinton White House generated 38 million e-mail messages (and the current Bush White House is expected to generate triple that number)." [3] Archivists must review presidential records page by page before the records can be disclosed to the public or it is determined that there are restrictions on disclosure. Data collected on declassification review indicate that a reviewer can review, on average, one page per minute, or 60 pages per hour. Given 1,920 work hours per year, an archivist doing nothing other than review could be expected to review about 115,000 pages per year. NARA provides eight archivists to each Presidential Library, one of whom is a Supervisory Archivist. Assuming seven archivists reviewing records, and an email with attachments averaging one page in length, they could review about 800,000 email messages per year. It will take 125 years for Presidential Library archivists to review and describe the Bush Administration's email for the first time.

The next section describes a method for recognizing the documentary form of records created by office applications such as word processors, spreadsheets and database management systems. It is then shown how the ability to recognize document type automatically enables the automatic description of items, file units and record series. Finally, the paper discusses how these technologies can aid archivists in appraising e-records and gaining intellectual control of accessioned e-records.
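The review-capacity estimate in the introduction can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function names are illustrative, and the total message volume is an assumption, since the text says only that the Bush White House was expected to generate "triple" the Clinton total. A round volume of 100 million messages at the rounded rate of 800,000 per year reproduces the 125-year figure, while exactly three times 38 million at the unrounded rate gives a somewhat longer estimate.

```python
"""Back-of-the-envelope check of the review-capacity figures in the introduction."""

# Figures stated in the text.
PAGES_PER_HOUR = 60           # one page per minute
WORK_HOURS_PER_YEAR = 1920
REVIEWING_ARCHIVISTS = 7      # eight per library, minus the Supervisory Archivist
PAGES_PER_EMAIL = 1           # an email with attachments averages one page
CLINTON_EMAILS = 38_000_000


def pages_per_archivist_per_year(pages_per_hour=PAGES_PER_HOUR,
                                 hours=WORK_HOURS_PER_YEAR):
    """Pages one archivist can review in a year of uninterrupted review."""
    return pages_per_hour * hours


def emails_per_year(archivists=REVIEWING_ARCHIVISTS,
                    pages_per_email=PAGES_PER_EMAIL):
    """Email messages a library's reviewing archivists can cover in a year."""
    return archivists * pages_per_archivist_per_year() // pages_per_email


def years_to_review(total_emails, yearly_capacity):
    """Years needed to review total_emails at yearly_capacity messages per year."""
    return total_emails / yearly_capacity


if __name__ == "__main__":
    print(pages_per_archivist_per_year())  # 115200, "about 115,000" in the text
    print(emails_per_year())               # 806400, "about 800,000" in the text

    # ASSUMPTION: a round 100 million messages; this value is not in the text.
    print(round(years_to_review(100_000_000, 800_000)))  # 125

    # ASSUMPTION: "triple" taken literally as 3 x 38 million messages.
    print(round(years_to_review(3 * CLINTON_EMAILS, emails_per_year())))  # 141
```

Either way, the estimate is well over a century of first-pass review, which is the point the introduction makes.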

Related resources

Grammar-Based Recognition of Documentary Forms and Extraction of Metadata

Metadata extraction is a critical aspect of ingesting collections into digital archives and libraries. A method for automatically recognizing document types and extracting metadata from digital records has been developed. It builds on a method for automatically annotating semantic categories such as personal names, job titles, dates, and postal addresses that may occur in a record....

Planned Focus-on-form Instruction in Task-based Language Teaching: The case of EFL learners’ oral grammatical accuracy performance

This study investigated the effects of planned focus-on-form instruction (pFFI) on developing oral grammatical accuracy in Iranian EFL learners. To this end, 60 lower-intermediate EFL learners studying English in a private English language institute in Tehran, Iran, were randomly assigned to two classes. Both classes received task-based instruction on grammatical points elicited in or...

Types of Grammatical Metaphors in Harry Potter and the Prisoner of Azkaban

Grammatical Metaphor (GM) is a language phenomenon introduced by Halliday (1985) within the framework of functional grammar. Thompson (2004) states that the salient source of GM is ‘nominalization’, in which a noun form represents a verb form; in other words, a verb form with its different processes is represented in a noun form. He continues that any wording ought to ...

Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language

Unlike Lexical Projections, Functional Projections (Extended Projections) are more ‘abstract’ in nature. Functional Projections therefore seem to be acquired later than Lexical Projections by L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections, taking into account their level of grammatical proficiency. Specifically, the aim is ...

Recognition of Persian Texts Using an n-gram Language Model and Grammatical Refinement

Text recognition has been one of the growing research topics in recent years. Much of this research has focused on the recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

Publication date: 2007